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1 Introduction: 

The QR algorithm [11} is the standard method for computing the eigenvalues of a 
general dense matrix on a traditional sequential computer. ^Vxth the advent of par- 
allel computers, a variety of parallel eigenvalue algorit hms have been proposed. For 
hermitian matrices, there have been two 'different approaches. In the first approach, 
the matrix is reduced to tridiagonal form as in the QR algorithm. The eigenvalues 
of the tridiagonal matrix axe found using either the divide and conquer method [4], 
or multisection [14]. "The other approach has been' to recognize the inherent paral- 
lelism in the Jacobi method [13,10], which was the st andar d algorithm for the problem 
•before the discovery of the QR algorithm. Many parallel Jacobi methods for hermi- 
tian matrices have been proposed and impl eme nted [2,1,23] and convergence of these 
methods has been proved [8,16,24]. 

For general, non-hermitian matrices, extensions of the divine and conquer or mul- 
tisection algorithms are not known. Many attempts have been made to extend the 
Jacobi method to the general case [22,6,5,25,3] and some of these are suited for par- 
allel implementation [22,6,25]. However, these parallel algorithms do not possess the 
quadratic convergence property typical of the Jacobi method for hermitian matrices. 
There have also been attempts at parallelizing the QR algorithm itself [26j and work 
along these lines may prove to be fruitful 

• In this paper we develop a parallel Jacobi-like algorithm for general complex ma- 
trices based on a method first introduced by Eberlein [5] and prove that the algorithm 
converges quadratically. 

The rest of the pauer is org aniz ed as follows. Li §2 we discuss Jacobi-like methods 
in general. In §3 the new parallel algorithm is described in detail In §4 a proof of ulti- 
mate quadratic convergence for the algorithm is presented. Finally in §5 experimental 
results are presented and analyzed. 


2 Jacobi-Like Methods: 


A Jacobi- like method for reducing a matrix to condensed form perzorms a sequence 
of similarity transformations 

Afe+i — -Mjt k =0,1, 2.... (1) 


where Aq = A. is the given n x n matrix and each of the Aft, k — 0,1,... is identical 

to the uni t ma trix except in the positions (pit, {qk->Pk), (pfeiPfe) ^d ?0* 

The potential for parallel implementation arises as follows. If we have a pair 


transformations 


( 2 ) 


and the indices p fel , ^ , P* , are distinct, the matrices and M * commute. 

Therefore the transformations can be applied in any order with the same enect. In 
particular, thev can he applied in parallel, first to the columns (from the right), and 
then to the rows (from the left). For certain choices of the sequence of pairs (p fc , q k ) 
as many as n/2 transformations can be applied in parallel [2,16]. 

At each iteration, the transformation M k is chosen to annihilate certain elements 
of the matrix A fc . The annihilations proceed in ‘sweeps’, where one sweep consists of 
a sequence of pairs (p fc , qk) in which every off-diagonal pair {(p, q), 1 < p < q < n] 
occurs exactLy once. 
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When A is hermitian, it is possible to choose the transformation M* to be a 
unitary plane rotation that annihilates the (pk,qk) and (qk,Pk) elements of Ak [8]. 
The sequence Ak always converges to a diagonal matrix (in practice), and under 
certain conditions convergence can be proved [8,16,24]. 

If the matrix A is non-hermitian (and in general non- normal), two classes of 
Jacobi-like methods have been proposed. In the first type of algorithm, Mk is re- 
stricted to be a unitary plane rotation and is chosen to annihilate the (qk,Pk) element 
of Ak [6,25,3]. For most matrices y4 0 , the sequence Ak is observed to converge to an 
upper triangular matrix (Schur form). Such methods will be referred to as Schur type 
methods. Some of these are amenable to parallel implementation [6,25]. 

The other class of algorithms diagonalize a diagonalizable matrix using both uni- 
tary and non-unitary transformations [5,21]. Mk is chosen to minimize the magnitude 
of the ( Pk,qk ) and (?jt,P*) elements of A as well as reduce ||A||, the Euclidean (or 
Frobenius) norm of A. (||A|| 2 = J\j |a,;| 2 -) To explain why the norm of A is reduced, 
we recall the following result. 

Lemma 2.1 For any square matrix A, 


inf||Jtf-MM|| J = £|A?| (3) 

1=1 

where M is non-singular and \ are the eigenvalues of A. 

Proof: The proof is due to Mirsky [17]. Let Q T AQ = A + T be the Schur decom- 
position of A. (Here A is a diagonal matrix containing the eigenvalues of A on the 
diagonal, and T is a strictly upper triangular matrix.) Let D = <fia<jr(l , e, c 2 , . . . , c n_1 ) 
where 0 < c < 1 


\\D- l Q T AQD\) 2 = ||A|| 2 + X;|< ii |VW-) 

< l|A|| 2 + e 2 ||T|| 2 . 

Since t can be an arbitrarily small positive number, it follows that the right hand 
side of (3) is less than the left hand side. The opposite' inequality also follows by the 
Schur decomposition of M~ l AM , so (3) is proved. □ 

This result can be used to show that reducing norm of A brings it closer to a 
normal matrix, which can be diagonalized by unitary transformations alone. This 
is also equivalent to reducing the size of elements of the matrix C = A A* — A* A, 
the departure from normality [19]. This type of algorithm will be referred to as a 
norm-reducing method. 

These two classes of methods differ in two important respects. First, the conver- 
gence behavior is markedly different. The norm-reducing methods display ultimate 
quadratic convergence for most matrices, and a proof of quadratic convergence can be 
obtained assuming the matrix is diagonalizable [20]. The Schur methods exhibit only 
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linear convergence in general [6], and no proofs of global convergence or asymptotic 
convergence rates are known for general matrices. 

On the other hand, computing the transformation A/* in the Schur methods re- 
quires knowledge of only the (p*,^), ( qk>Pk ), ( Pk,Pk ) and elements of A, 

whereas for the norm-reducing methods the entire pjt’th and g*’th row and column of 
A are required. Besides being more expensive, this has a more serious consequence 
in that it destroys parallelism. Although the transformations Mk can still be applied 
in parallel as described earlier, the effect is different from applying them sequen- 
tially, since the application of one transformation affects elements used to compute 
another. As a result, it is no longer possible to show that ||A|| is always reduced by 
the transformation [5], and the quadratic convergence proof [20] no longer applies. 

Sameh [22] developed a parallel version of the norm reducing algorithm for real 
matrices [7]. Sameh showed how to compute n/2 transformations that could be 
applied in parallel to a real matrix A and would always result in a reduction of ||A||. 
Sameh also proved that the algorithm converged to a normal matrix. So for real 
matrices, we could use Sameh’s algorithm to reduce the matrix to a normal matrix 
and then diagonalize the resulting matrix by the Jacobi method for normal matrices 
[9] (which uses complex arithmetic, but converges quadratically). However, we do 
not know of any investigation of the rate of convergence of Sameh’s algorithm to a 
normal matrix. 

In this paper we develop a parallel algorithm for general complex matrices but 
take a different approach from that of Sameh, combining the norm-reducing and 
Schur methods. We compute the transformations Mk as in the original norm-reducing 
methods [5,21], except for the following differences: 

1. Instead of minimizing the magnitude of the (p*,?*) and (qk,Pk) elements of A, 
we merely annihilate the ( 9*,p* ) element as in the Schur methods. 

2. We stop when we have reached triangular form, as opposed to diagonal form. 

3. We compute and perform transformations in parallel. 

We will show that parallelism can be achieved without sacrificing the property of 
quadratic convergence. Further, because we only seek triangular form, the method 
does not try to diagonalize some obviously defective matrices unlike the original 
norm-reducing methods (e.g. a matrix in Jordan canonical form). 

3 Description of the Algorithm 

For each similarity transformation M* in (1), there is a pair (pk,qk), p* < qk, that 
identifies the submatrix where Mk differs from the identity. 
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The corresponding elements of A* will be called pivot elements and the pair (p*, q *) 
will be called the pivot of the transformation. In what follows we will omit the 
subscript k when it is clear that only one pivot (p*, qk) = (p, q) is involved. 

Each transformation M is one of the following three types of transformations. 

Unitary Transformation: 

The unitary transformation U = {u,j} with pivot (p, q) has the following structure 
in the positions where it differs from the identity. 


u vv 

Up, 

u 9 p 

u„ 


( cosx — e ,fl sinx 
e~ ,e sin x cos x 


where x and 0 are real. 

The following choice of x and 0 ensures that the ( q,p ) element of A' = U T AU is 
zero [6]. Let 

= (°9» — a pp)> 

= i \J &j>q "1" 4flp,0,p, (4) 


where the sign is chosen to achieve the largest absolute value. Then the parameters 
x and 0 are given by 


tan x = 


2 e ie a 


<tv 


dma 


( 5 ) 


where 0 is chosen to make the value of tan x real. 

In practice we need to bound the angle x to avoid migration of large elements 
from the upper triangle of the matrix to the lower part. We impose the bound 


tanx| < 1. 


(i.e., if |tanx| > 1, we set it to 1.) Not using such a bound causes slowing of 
convergence in the earlier sweeps. This is similar to the bound used by Eberlein [6]. 
Shear Transformation: 

The shear transformation S = {sjj} with pivot (p, q) has the following structure 
in the positions where it differs from the identity. 



— ie ta sinh y 
coshy 



( 6 ) 


where y and a are real. 

Let A' = 5 -1 AS. The parameters y and a are chosen to zero the first order terms 
in d\\A'\\ 7 / dy and 5||A'|| 2 /5or. It can be shown that with this choice, ||A'|| < ||A|| 
[5]. 

Let 

= {l a pil 2 "+■ l a «! 2 + l a ipi 2 + ' ( 7 ) 

j¥p.9 
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dpq — ( a qq °pp)> 
(pq = e 


*qp 


+ c 


*pqi 


(S) 

(9) 

( 10 ) 


°pq ~ ^ \hi a pj a qj a jp a jq)> 

3 = 1 

Note that Cp, is just the (p,q) element of C — AA* — A* A. Then the parameters a 
and y are given by 

a = arg(cp,) - tt/2, (11) 

~~1 c mI 


tanhy = 


( 12 ) 


2(K,| 2 + iu 2 ) + <V 

The effect of this transformation is also to reduce the size of the (p, q ) element of 
C, the departure from normality [5]. Clearly, if Cp, is zero, the above transformation 
reduces to the identity. However, ||y4|| may still be non-normal with C having large 
diagonal elements. The following diagonal transformation reduces the size of the 
diagonal elements of C [21,18]. 

Diagonal Transformation: 

The diagonal transformation D with pivot j is the identity matrix except for the 
yth diagonal element, which is tj. Let 


9 j = j 

h i = M 2 j 


Then choosing 


minimizes the value of 


It can be shown [18] that 


i 3 = 




h 

9j 


il + h?t 7 - 


WDj'ADtf = HAlp - - htf. 


(13) 


Note that this transformation does not affect the diagonal elements of A, and therefore 
reduces the norm of the off-diagonal part of A , in addition to reducing ||.4||. 

In practice we do not wish the matrices D to become too ill-conditioned, so we 
impose the restrictions , 

1/r < tj < t (14) 
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where r is a constant. 

Rotation: A rotation transformation R with pivot (p,</) is the composition of a 
shear and a unitary transformation, SU, each with the same pivot (p, q). The unitary 
transformation U is computed using the elements of the matrix A = S~ l AS. 

Commuting Sequence: A sequence of pairs (p*,g*) €{(p, <j), 1 < P < <7 < **} 
such that Pi,gi,P 2 ,< 72 ...Pmi$m are distinct integers is called a commuting sequence 
of pairs. The methods described in [16,24] allow jus to construct many commuting 
sequences where m is either n/ 2 or (n — l)/2. 

Rotation-set: A rotation set is the composite transformation T 

T — R 1 R 2 R 3 • • • Rmt (15) 

where Rk is a rotation with pivot (pit, qk), and the sequence {(p*,?*)} is a commuting 
sequence. 

Parallel. Rotation Set: In order to apply a rotation set T by computing T~ l AT 
correctly according to the formulas (4) - (12), we would have to compute and apply 
each of the transformations Rk in sequence, since the application of one transforma- 
tion affects the elements needed to compute the next. Instead, we will compute each 
of the transformations R * using the original matrix A. Since the pairs (p*, qk) consist 
of distinct integers, the resulting transformations Rk can all be computed and applied 
in parallel. We will call the resulting composite transformation T a parallel rotation 
set with pivot sequence (pk,qk)- 

T = RiR? . . . Rrn- (16) 

Parallel Ordering: Let O be a sequence (pjt,^jt) of N = n(n — l)/2 pairs such 
that O is a concatenation of s commuting sequences. Further let each pair (p, g), 
1 < P < 9 < n occur exactly once in the sequence 0. Then 0 is called a parallel 
ordering. 

Two examples of parallel orderings are shown in Fig.l. The matrices in the figure 
represent parallel orderings by indicating the commuting sequence to which each pair 
(p, q) belongs. The commuting sequences are numbered in the order they appear in 
the parallel ordering. These two orderings were introduced by Brent and Luk [2] and 
Luk and Park [16] respectively. Many other parallel orderings have been introduced 
as well [24,15]. The modulus ordering will figure in the quadratic convergence proof, 
so we define it now. 

Modulus Ordering: A parallel ordering is called a modulus orderings if the pair 
(p, q) is in the 7(p, g)th rotation set of the ordering, where 

/(p, q) = 1 + (p + q - 3) mod n. (17) 

Sweep: One sweep of a parallel ordering 0 is a transformation composed of s 
parallel rotation sets and n diagonal transformations. Each parallel rotation set has as 
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X 1 5 4 2 3 \ 

x 3 2 5 4 

x 1 4 2 

x 3 5 

x 1 

\ X J 

Brent Luk Ordering 


x 1 2 3 4 5 \ 

x 3 4 5 6 

x 5 6 1 

x 1 2 

x 3 
x / 

Modulus Ordering 


^ Figure 1: Parallel orderings 

its pivot sequence one of the commuting sequences that comprise the parallel ordering 
O. The value of s will be either n or n - 1 depending on the parallel ordering used. 
For example, in the Brent-Luk ordering s = n — 1, whereas in the Modulus ordering 
s = n. A sweep SW is defined as 

SW = fxD&Di . . . r n _i A,-i-D n ( 18 ) 

for the case s = n — 1 and 


SW = f 1 D l f 2 D 2 ...T n D n 

if s = n. Each diagonal transformation D p above has the pivot j p , where {j p ) is a 
permutation of {1 ... n}. 

So at the end of a sweep, every pair (i,j) has been covered by a unitary and 
shear transformation, and every row and column has been covered by a diagonal 
transformation. 

Parallel Norm Reducing Jacobi Method: We continue to perform sweeps 
until the following convergence criterion is met. Let 

Ak = Lk + Hk + Rk 

where Lk is strictly lower triangular, Hk is diagonal and Rk is strictly upper triangular. 
We stop when 

||I»|| < (n 2 /2)e. (19) 

(e is the value of machine precision, and the norm used is the Euclidean norm.) This is 
only one of many possible convergence tests that can be used. The ultimate quadratic 
convergence of the algorithm allows for considerable freedom in this choice. 

Parallel Implementation: The parallel norm reducing Jacobi algorithm can be 
implemented on a square grid of n 2 /4 processors as follows. Each processor holds 
four matrix elements. The n/2 diagonal processors are responsible for computing the 
rotation parameters (elements of Rk). The computation of the Cp^’s and G pq ' s is done 
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} 


with each off-diagonal processor computing the required terms and sending them to 
the appropriate diagonal processor. The rotations are broadcast from the diagonal 
processors to the appropriate off-diagonal processors after which all the processors 
apply the rotations to the elements of the matrix. This constitutes one rotation set 
with the elements a Pk<lk , k = 1 . . .n/2 being the off-diagonal elements residing in the 
diagonal processors. Next the processor in the (1,1) position performs the appropriate 
diagonal transformations. Finally all the processors exchange data using the scheme 
given by Brent and Luk [2], and the above process is repeated n — 1 times. It can 
easily be verified that this implements a sweep as defined in (18). (Other parallel 
orderings [16,24] may be implemented by using an appropriate data exchange step.) 

At the end of each sweep, the convergence test is carried out. The computation 
of Pfc|| can also be done in parallel. It is clear that one sweep of the algorithm 
takes 0(n log n) time when implemented as described above (since accumulating the 
row and column information for each rotation set can be done in O(logn) time). In 
§5 it will be noted that O(logn) sweeps are required for random matrices, giving a 
complexity of 0(n log 2 n) for the parallel algorithm. 

Eigenvectors: The transformations Mb are accumulated in a matrix P. In 
§5 it will be noted that the final matrix A* we obtain on convergence is always 
diagonal. This allows us to read off the eigenvectors from the columns of the matrix 
P. Accumulating the transformations can also be done in parallel along with the 
iteration. 


4 Theorems on Quadratic Convergence: 

4.1 Quadratic Convergence for Diagonalizable Matrices: 

We now show that the parallel norm reducing Jacobi method converges quadratically 
in the later stages of the iteration. Note that this does not say anything about global 
convergence, but only describes the rate of convergence if the algorithm does converge. 
First we introduce the notation. Let A be the diagonalizable matrix we are working 
with and A a diagonal matrix containing the eigenvalues of A on the diagonal. If Z 
is the transformation that diagonalizes A, 

A = Z _1 AZ. (20) 

We divide A into a diagonal part D and an off-diagonal part E. 

A = D + E. 


Let A,- be the eigenvalues of A, let 6 be the minimum separation between distinct 
eigenvalues of A, „ 

6 = min |A,- - Ay|. (21) 
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By the Bauer-Fike theorem [11], for each Xj, there is an element a* * of D, such 
that 

|Ai-«**|<||£||. (22) 

If A is close to a diagonal matrix so that 


ii£||<f. 


then it follows from (22) that each diagonal element a 9? of A is close to exactly one 
eigenvalue A(a„) of A, 

K, - A(«„)| < ||J5||. 


We will say that a„ is affiliated to the eigenvalue A(a„). 
Let 



(23) 


Theorem 1: Let the diagonalizable matrix A = Ai have distinct eigenvalues. Let 
(20) hold. Let An be the matrix obtained after applying one sweep of the parallel 
norm reducing Jacobi algorithm to A i} using any parallel ordering. Let E\ and En 
be the off-diagonal parts of Ai and An respectively. 

Suppose Ai has already been sufficiently diagonalized, so that 


Pi||< 


3 ~ 2n 5 
36 7 2 n 2 ’ 


(24) 


Then, 


II-BnII < /<'i(A,f,n,r)i|jE; 1 || 2 . 


where K% is independent of ||jEt|| and r is the bound (14) on the size of the diagonal 
transformations. 

Theorem 2: Let A = Ai be a diagonalizable matrix. Let (20) hold. Let An be 
the matrix obtained after applying two sweeps of rotation sets to A x using a modulus 
ordering (17). Let E\ and En be defined as in Theorem 1. Suppose A\ has already 
been sufficiently diagonalized, so that 


ll*,ll <(!)’"£■ (25) 

Also let A\ be permuted so that diagonal elements affiliated to equal eigenvalues 
occupy adjacent positions. Then 


||^|l</C 2 (||A|U,n,r)||F; 1 || 2 . 

where Ki is independent of )[jE7i||, and r is the bound (14) on the size of tfie diagonal 
transformations. 
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Notice that in Theorem 2, when we allow multiple eigenvalues, we assume a par- 
ticular parallel ordering while Theorem 1 holds for any parallel ordering. Further, in 
Theorem 1 we can bound the error after one sweep whereas in Theorem 2 we bound 
the error after two sweeps. The assumption that diagonal elements affiliated to equal 
eigenvalues are in adjacent positions can be ensured by an appropriate permutation 
of the matrix during the iteration. 

Outline of proof: The structure of the proof is similar to proofs of quadratic 
convergence of other Jacobi methods [20,9]. the argument is more involved here 
because we have to consider the effect of performing transformations in parallel. The 
outline of the proof is as follows. 

1. It is shown that some of the angles x and y in (5), (12) are small if 

\\E\\ = e<8/8. (26) 

2. It is shown that (26) can be maintained throughout the sweep if ||£o|| is small 
enough. 

3. It is shown that after each unitary transformation, the a pq element is almost 
annihilated in addition to the a qp element which is exactly annihilated. 

4. Bounds for the entire sweep are computed using the above results. 

Notation: Throughout the proof, E will refer to the off diagonal part of the 

matrix A , E r to the off-diagonal part of A r , etc.. 

4.2 Preliminary Lemmas: 

We begin by stating some known results on the structure of an almost diagonal matrix 
and the effect of a similarity transformation on the norm of a matrix. 

If A has been permuted so that diagonal elements affiliated to equal eigenvalues 
occupy adjacent positions on the diagonal, we can partition A as follows. 



Each Ajj has its diagonal element affiliated to equal eigenvalues. 
Lemma 4.2.1: If A is partitioned as in (27) and 

then for y := 1 ... m, 

inyi < - Mil < 
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Proof: Refer to Wilkinson [30]. □ 

Lemma 4.2.2: 

M 2 - IIAII 2 < 4 7 ||i?|| 3 . 

Proof: Refer to Ruhe [20]. □ 

Lemma 4.2.3 Let A' = S~ l AS where S is a shear transformation defined by (6), 
(11) and (12). Let the quantities G pq> ( pq and Cp, be defined by (T)-(10). Let K pq 
be defined by 

= Cp? — (dpq a <jp ~ < ^‘pq a vq)' ( 28 ) 

Then 


= 2 sinh 2 yG pq + 4 cosh 2yIm{K pq e~ xa ) 

+ 2sinh4y(|d p ,p + |^,| 2 ) -4cosh4y/m(d;,^,), (29) 


i 


sinh 2y da 


= -2Re(I<e ,0r ) + 4sinh2y/7n(a‘ p ap 7 e~ 2 * a, ) 
+ 2 cosh 2yRe(d m pq a pq e~ xa - d m pq a qp e' a ). 


and 


c 'po | 2 


d\\A'\\' 


d\\A>\\ 


) 


dy sinh 2y da 
Proof: Refer to Eberlein [5] for (31) and Ruhe [20] for (29) and (30). 


(30) 

(31) 

□ 


4.3 Bounds on angles x and y: 

By the definition (12) it can easily be shown that 

| tanhy| < 1/2. (32) 

The bounds cosh 2 y < 4/3 and sinh 2 y < 1/3 follow. Refer to [5] for details. 

The following Lemma may easily be shown using the method in [8]. 

Lemma 4.3.0: If ||i£|| < | then no rotation set can cause a diagonal element to 
change the eigenvalue it is affiliated to in partition (27). 

Proof: Refer to Forsythe and Henrici [8] and Ruhe [20]. □ 

Therefore we can classify the pivot elements into two sets, depending on whether 
or not the corresponding diagonal elements are affiliated to equal eigenvalues. Refer 
to the partition (27) of A. We define the subset J of the parallel ordering 0 as follows. 

J = {(Pfc> 9*)|^(°pm) 7^ r (33) 
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Lemma 4.3.1: Suppose ||jF|| < S/S. Let the pivot (p,q) € J. Then the angles x 
and y computed by (5) and (12) satisfy, 

| sinh y| < 1.01-®, coshy < ^1 + ^ L01 > 


and 




Proof: Consider d pq defined by (8). Since r € J, 

lap'll — l^pp — — I a pp ~ \ pp\ — I 0 ?? — — 2p|| > —6. 


From the definition of x, (5) 


tanxl = 


2a, 


9P 


Since d max , given by (4) satisfies |d mox | > |rf M |, and |a M | < ||£||, 

|sinx| < |tanx| < ^ -ffi - . 

3 0 

The bounds on y are proved by Ruhe [20] as follows. From the definition (12) of y, 


| tanhy| = 




< I^P9^P9 1 ~f~ \K n \ 


2|^I J 


(34) 


2(|d„l J + |{„l 2 ) + G„ 

where we have used the following relation from [5], 

- |cj = Im(<r pq U - im(/r„ C - io ). (35) 

It can be easily shown using (9) that |£ p ,| < V^2||£’||. By (28) and (10) we get 


\ K pi\ = I £ fl pi a « “ a j>«l 

i/p.9 

< 5 £ l«„f + M 2 + M 2 + |o«p. 

Z J*P.9 , 


(36) 


From this we get |if M | < ||£|| 2 /2. 
Therefore (34) becomes 


,-,,4^)13* HI 


The remaining inequalities follow from cosh y =(1— tanh 2 y)~s and | sinh y| *^=| cosh y tanh y |. 

□ 
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4.4 Effect of one parallel rotation set: 

Consider the parallel rotation set defined in §2 (16). 

T = R 1 R 2 • • • Rm = S 1 U 1 S 2 U 2 • • • SmU m . 


(We have dropped the * notation.) Since the pivots (p r >9r) for the rotation set form 
a commuting sequence, 

T = S 1 S 2 ...S m U l Ui...U m . (37) 


Notation: A = A 0 
For r = 1 . . . m, 


is the matrix to which the rotation set is being applied. 
A r =S- , ...Sf'A 0 S 1 ...S r = {<#} 


(38) 


£. = llftll. 


Further," 


A = U-' . . . Ur'S-' . . . S;'A 0 S, . . . SM ...U m = {iff}’ 


(39) 


We will use the definitions (7)-(10) and (28) for the quantities Cp,,(7 w , etc. The 
use of the superscript (r) or * will denote that the quantity is computed using the 
elements of A r or A respectively. We will need the norm 


n 


ll^llr = X 

*,. 7=1 


Note that for any matrix A , ||4|| < ||>1||e < n||A||. 

Note that = dP) since the pivots (p r , q T ) form a commuting sequence. Sim- 
ilarly = £p°l r - We will drop the superscripts for these quantities, so d prgr = d^ r 
etc. for all r. 

In previous Lemmas we have assumed that e r < S/S. However, e r may increase 
during a sweep. We need to bound the growth of e T during the parallel rotation set 

so that we can then find conditions at beginning of the sweep that will ensure that 

||2?|| < S/S holds throughout the sweep. To do this we will use the || ||e norm defined 
above and bound the growth of ||i? r ||i:. i 
Lemma 4.4.1: If ||.Eo||:e < S/S then 

IlfiJIs < 4.6II&II. (40) 

If A has distinct eigenvalues we get the slightly sharper bound 

Pmlls < 2 .S||£o||. (41) 
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Proof: We first estimate the elements of E m that are not pivot elements in the 
rotation set. We define for all r = 0 ... m, 


Let 

So, 


^ H I k ¥* j and (kj) ^ ( Pt , q r ) for any r. 

kj 


A' = S~ l ...SfM 0 - 


(42) 


A m — • . . S m . 

Consider the element off. It is affected by only one of the shears S r in (42). The 
effect takes the form 


a 'kj = cosh VrC^kj ± ^kj e±t ° T sinh y r . 
where aj^ is some other non-pivot element of A 0 . If 

I cosh y r | < c and | sinh y r | < s for all r, 

K,l < c|«g>l + *|ag?|. 

If is defined analogous to ty r , 

*' = E Kil ^ ZXckl?! + *I«S ) I), 

kj kj 


where the sum taken over all non pivot pairs. But £|a*y| = Elaj^l, therefore 

V < (c + s)$ o- 

By a similar analysis, < (c + s)V. So, 

tfm<(c + s) 2 tf o. (43) 

We now have to consider the pivot elements and aS" r j) r . Consider one pivot 
element with (p, q) — (p r ,q r ). It is affected by only one shear, S = S T . By direct 
calculation 

a w } = 4? + sinh 2yd pq + sinh 2 yf„}. 

and 

a l7 1 = 4? + sinh 2 y d p<> + sinh2 

Therefore 


K’l + l«£ ) l < Kl + l“S?l + + 2| sinh 3 y||{„|. 
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We substitute the value 


t&nhy = d® 

2(W,I* + IU J ) + <?£’ 

for tanhy from (12). (The angle y is computed using the elements of A 0 , hence the 
use of the (0) superscript here.) Also, we write in terms of K$ and use (34). So, 

l«£°l + l a w ) l ^ l a p?l + 1^1 + cosh 2 y j|£ w | + + 2sinh2 ylC P ?l- 

Now we use the bounds c and s for | coshy| and |sinhy|. Also we use |<£ p ,| < |a^| + 
l°J?l which follows from (9), and \d pq \ > 3 <5/4. 

■ l-S’l + |a<"> | < (1 + c 3 + 2^=)(|o<°»| + |«£>|) + . (44) 

Define $ r as the sum of the magnitudes of the off-diagonal pivot elements, 

** = D4r,li + i<,U 

r 

By summing all the equations (44) for ( p , q) = (p r ,q T ), r = 1 . . . m, we get 

4>„ < (1 + c 1 + 2s I )<6 0 + ^ E I |. (45) 

Using (36) we can show that 

D^lsiWspyil- 

r 

Using this, the assumption ||^o||e < 6/8 and adding (43) and (45) we get 

ll-^mllr < (1 + c 2 4- 2s 2 + c 2 / 6 + 2 cs)||E 0 ||e- 

Inserting the bounds from (32) we get (40). If A has distinct eigenvalues we can use 
the bounds from Lemma 4.3.1 to get (4*1). □ 

Let £q denote the value of ||£|| at the beginning of the sweep. Let t be the 
maximum value of ||£|| throughout a sweep of N rotation sets. 

Lemma 4.4.2 If e 0 < 6/8 then, 

e < n(4.5) N £ 0 . 

If A has distinct eigenvalues we get the smaller bound 

£ 4 n(2.8) £q 
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Proof: The diagonal similarity transformation cannot increase the value of ||£|| 
by (13). Further, the unitary transformations cannot increase ||£|| either (this can 
easily be shown using (5) and the fact that the transformations U v are unitary). The 
results then follow from Lemma 4.4.1.. □. 

Now we have bounded the growth of e r . We will use this in §4.5 and §4.6 to show 
that e < <5/8 holds throughout the sweep. For the moment we will continue to assume 
it does. 


The next step in the proof is the crucial step in establishing quadratic conver- 
gence. The unitary transformations are chosen so as to annihilate the pivot elements 
d, rPr . We need to show that in addition to this, the pivot elements a Pr9r are almost 
annihilated. 

Lemma 4.4.3: Let e < 6/ 8. If the pivot pair (p r ,9r) G J , where J is defined by 
(33), 


*Prq 


J< 36 7 2 ^ 


where 7 and A are defined by (23) and (20). 

Proof: Consider one pivot pair (p r ,q T ) = (p,<?). From (2S) we can write 


* _ (Cy<7 Kyq) 

a pq — r 


pq 


(46) 


because the unitary transformation U m is chosen to ensure that a qp = 0. We already 
have the estimates \K n \ < e 7 /2 and > 35/4 (since e < 5/8). We need to estimate 

Cpq- 

First consider We can estimate this from Lemma 4.2.3. Using (29) and the 
relation A r+ i = S~ l A r S r , 


2 sinh 2 y r G% + 4 cosh 2y r Jm{K^t ~ iar ) 

2sinh4y r (|d p ,| 2 + |£ p ,| 2 ) - 4 cosh 4y r Jm (<£,£„). 

Rewrite this as 

“ 4 ( sinh yr(2(|d p J 2 + |^,| 2 ) + Gg ) )-coshy r /m(d^ p ,)) 

+ 2(sinh 4y r - 4sinhy r )(|d p ,| 2 + |£ p ,| 2 ) -f 4(cosh4y r - cosh y r ) I m(d^ p<l ) 
+ 2 sinh 2 y r G% + 4 cosh 2y r Im(K^e~ i0r ) - 4 sinh y r G% 


<9|l^ + il| 2 

dy r 

+ 


The first term is of first order in y r . We estimate it as follows. 

4 (sinh y r (2(|(f P9 | 2 + |£„| 2 ) + <?£>) - , cosh y r Im(d^„)) = 4cosh y r (-|c^ - Im(d^(„)) 

= 4coshy r Im(—IC^e~' ao ) 
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where we have used the value of tanh y r from (12) and (35), noting that y r is computed 
using the elements of Ao. The point to note is that this choice of y r makes the above 
term 0(e 2 ). We can now use the bounds \K$\ < e 2 /2, G$ < t 1 for all r, as well as 
the angle bounds from Lemma 4.3.1. We also use the relation S 2 < 2 ||/ 4 0 || 2 . Finally 
we arrive at the estimate 


Using (30), 


MA 


r +i I 


dy T 


< 27||/loir 


P 


(47) 


1 

sinh 2y r 


da T 


= -2Re(K^e~ tar ) + ismh2ylrn(a’ p a pg e 2,0 ) 
+ 2 cosh 2yRe(d* pq a pq e~' ar - d^a, p e ,0rr ). 


Here all the terms are 0(e 2 ) except the last one. But recall how angle a r is chosen 
(11). With this choice Re(c$e~ iQr ) is zero. Using (28), 


Re(c^e-^) = -Re(d; q a pq e-'°r - d* pq a qp e ia ') + Re(Kffe~ ia ') = 0. 

This gives us a second order estimate for the last term as well. Using the bounds on 
the angles etc. we arrive at 


From Lemma 4.2.3, 


I . 1 apr-nll 2 , 

sinh 2y r da r 


< &\\Ao\\ 7 y 2 . 


( 48 ) 


i a|[^+.ll» i . i 

2 dy r sinh2y da T 

Using (47) and (48), 

l«S?l < 22||>l 0 |p|l. 

Recall that the pivots (p r , q r ) form a commuting sequence. Therefore we can move 
any shear S T to the mth position to get a bound on c^ r . Such a permutation does 
not affect the matrix A m . Therefore 



\c£l\<22\A 0 \\ 2 j; for all r = 1 . . . m. 

Now we have to consider the effect of the unitary transformations. The only one we 
need to consider is JJ m . To see why, consider C r = A r A*-A*A r , and C = A/V - A* A. 

c = u^...u;u;c m u 1 u 2 ...u m , 
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since U*U r — /• Let (p, q) = (p m ,g m ). Since the pivots (p r , q T ) form a commuting 
sequence, the only unitary transformation transformation affecting is U m . Now 
note that U m does not affect the norm of the block 


f C PP ^9 ^ 

\ C 1P C 99 / 

By definition (10) we have the bound |c jV | < e 2 for all j. So we can obtain the 
following estimate for Cp 7 , 

^ < mM?j 2 

Again, since we can have any of the pivots (p r ,9r) in position m by a permutation 
that leaves A invariant, the above bound is true for all pivots (p r ,q r ). 

Using (46), 


*J>r9r I 


Now we use Lemma 4.2.2, 


< M+MU 

_ K,\ 

< ^(23||-4o|| 3 + \\\M\ 2 )^ 

< 32|Md|^. 


*Pr9r 1 


< § (UAH’ + 4 T eV 

<-*4 


where we have used 7 > i and e < 8/8. 


(49) 

□ 


4.5 Proof of Theorem 1: 

Consider a sweep as defined in (18) 


SW = T 1 D 1 T 2 D 2 ...T n D n . 


(50) 


Here we have assumed that the parallel ordering consists of n commuting se- 
quences. The discussion is similar if only n — 1 commuting sequences are involved in 
a sweep, and the bounds derived here apply for that case as well. 

From condition (24) and Lemma 4.4.2 the maximum value e that ||i?|| can take 
during the sweep is 


£ < n(2.8) n 


3 ~ 2n 6 8 

36 7 2 n 2 “ 8’ 
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Now suppose ( h,j ) g P, for any / < v + 1. The pair (it, j) cannot be a pivot 
pair for the current rotation set (i/), so it is either in some Pj for / < v or is not. If 
(k,j) & Pi for any l < i/, we use (61) to get the first term of the maximum in (57). 
Alternatively, if (k,j) € Pi for some l < i/ we get the second term of the maximum in 
(57). The factor r in (57) must be included because of the diagonal transformation, 
which can increase the size of an element by this factor. A similar argument leads to 
(59) if (k,j) e Pi for some / < v + 1. 

Similarly, for iteration (52), we get (60). For (58) the first two terms in the 
maximum are obtained as above. We need to consider the additional possibility that 
(*, j) € P v +i- But the pivot elements after the rotation set are bounded by (49), from 
Lemma 4.4.3, which leads to the third term in (58). □ 

The following Lemma is similar to one proved by Hansen [12], while discussing 
the quadratic convergence of the Jacobi method for symmetric matrices. 

Lemma 4.5.2: A solution (fl„, 0„) to the recurrence 

ft„+i = co max {(ci + c 2 )ft„, ciSl„ + c 2 0„} , 

Qh-i = co max {(cj -f c 2 )0 t ,,c 1 © l/ + c 2 ft„} , 

0i =0, 

where c; > 0 for all i, is given by 

ft* = Co -1 ( c i + c 2 ) l '“ 1 fti for v = 1,2, . . . (62) 

0t- = c£ -1 {(ci + c 2 )*' _1 - Ci -1 }!!! for v = 1,2, ... (63) 

Proof: The proof is by induction. The case v = 1 is easily verified. Assuming 
the propositions (62) and (63) are true for v — 1, 

= fycS max {(c! + c 2 ) 1 ', 

c i(ci + c 2 ) 1 ' -1 + c 2 ((ci + c 2 y~ l — cp 1 )} 

= fijcJ max {(c, + c 2 ) v , (c, + c 2 ) v - cp J c 2 } 

= Hjc^fcx + c 2 )*\ 

and 

0*+i = fljCg max |(cx + c 2 ) {(ci + c 2 )*' -1 — cp 1 } , 
c i {(ci + c 2 y~ l - cp 1 } + c 2 (cx + Cj)"" 1 } , 

- fiicS max {(c x + c 2 p - cp^Cj + c 2 ) , (cj + c 2 ) v - cp 

= ^{(ci + c 2 r-cp. 


□ 
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Now consider the recurrences (57)- (60). We could reduce them to two recurrences 
if we didn’t have the third term in (58). Then if H x = ||£ x |j, the quantities ft 2 „_ x 
and 0 2 „_ x are upper bounds for u>„ and 6 U with <*> = r, c x = 1.06 and c 2 = 4 e/6. To 
take care of the third term in (58) all we have to ensure is that 


_2 

<£ _1 (ci +c 2 ) ,/ - 1 n 1 >36r 7 2 — for all y > 1. 

0 

Since fi„ is monotonically increasing with these values of a (r > 1), we only need to 
ensure this for v = 2. We can use e = n2.8" by Lemma 4.4.2. We get the condition 


l|£ill< 


3 ~ 2n S 
36 7 2 n 2 


which is guaranteed by (24). 

Now consider 6 ny the maximum non pivot element at the end of the sweep. 


«n<e !n .! = T 2 - 1 


( 


106 + t) 


2n— 1 


- l.oe 2 "- 1 






where we have used e < ^/8. Substituting the value = ||jB x || and using e < S/S as 
well as £ = n(2.8) n ||£ x || from from Lemma 4.4.2 we obtain 


*2n— 1 < (3r) 2n — ||£ x || 2 . 

So the non-pivot elements at the end of the sweep are bounded by the above. The 
remaining n off-diagonal pivot elements from the last rotation set are bounded by 
(49) so adding the bounds on these elements and the bound above, we get 

ll^||</^(A, 7t 5,r)||£ 1 || 2 J 

where 

i<i = ^y (3T)gn iig,n 2 . 

□ 


4.6 Proof of Theorem 2: 

If A has multiple eigenvalues we can no longer always use the angle bounds from 
Lemma 4.3.1, since the pivot pairs need not belong to the set J (33). In Theorem 
2 we assume that the parallel ordering used is the modulus ordering (17): We will 
assume that n, the order of the matrix, is even. The proof for odd n is similar. 
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x 7 8 

x 9 

\ x / 

Next n-3 rotation sets 


Figure 2: 


Consider 2n — 3 rotation sets using a modulus ordering 

T 1 D 1 T 2 D 2 . . .T n -\D n -iD n . . . T^n-zDin-z- (64) 

Note that we consider 2n — 3 rotation sets even though one sweep of the modulus 
ordering involves only n rotation sets. We are including some of the rotation sets of 
the next sweep. Fig. 2 illustrates the rotation sets we consider for the case n = 6. 

From (25) and Lemma 4.4.2, e , the maximum value of ||i?|| during the application 
of two sweeps of rotation sets to A x (2n rotation sets) is 


•^(sfE-i 


Notation: A\ is the matrix before applying the two sweeps of rotation sets. For 
v — 1,2,..., 

(65) 

( 66 ) 


A v = DZ x T: l A v , 
A v = A V T V D V , 
■^iz+i = A v . 

We define an ‘antidiagonaT as 


X* = {(k,j)\k+j = v). 


Let P„ be the pivot sequence used by the j/th rotation set. ordering. Then it is easily 
shown that (refer to Fig. 2) 


p* = Xu+2 u Xn+v+2 if V < n, 

Pv = Xv +2 u Xu-n +2 if V > n. 
We partition E u , the off diagonal part of A u as follows. 

Ey = B„ + R u + E„, 
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where 


B. = 

R. = 

{>#}, F. = {/£>} 

ft = { 

>> 

a ki 

0 

k + j < v + 2 
otherwise 

II 

Si 

1. 

>) 

a kj 

0 

k + j = v + 2 
otherwise 

ft = { 

ft 

0 

k + j > v + 2 
otherwise 


E v and E v are similarly partitioned. 

We will prove by induction on v that there exists M u independent of ||i?i|| such 


that 


IlfUb < M.llfi.11 1 


(67) 


For v = 1, ||5i||e = 0 so the proposition is true. We assume it is true for v. 

Case 1: v < n. The partition of E„ into B „ , R v and F„ is shown in Fig. 3. First 
consider the pivots in Xn+i»+2- It is easy to see that these pivots cannot affect the 
elements of B u , since they couple elements of F„ among themselves. Now consider 
the pivots in X*/- 

Consider the transformation (65). Consider an element b k j of B„. It is coupled 
with another element a y by a shear and a unitary transformation. Let the pivot pair 
of the transformation be (p, q). Let x and y be the angles involved in the unitary and 
shear transformations respectively. 

1. — € B v . 

Let b' k j be the value of the element after the shear and unitary transformations, 


< (|cosx||coshy| + |sinx||sinhy|)|6fcj| + 

(| sinx|| coshy| + | cosx|| sinhy|)|6^|. * (68) 
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Using the angle bounds from (32) we get 

K>l < %/3(M + iy). 

After the diagonal transformation we get 

l^il ^ r V / 3(|6*j| + l&fjD- (69) 

2 - a k, ~ fkj € F »- and (p.$) $ J- 

It is evident that in this case k,k > p and j,j < q. Since the matrix has been 
permuted so that diagonal elements affiliated to equal eigenvalues occupy adjacent 
positions on the diagonal, it has the structure (27). Therefore hi lies in one of the 
diagonal blocks Eu in the partition (27), and by Lemma 4.2.1 

\hi\<^e 2 . (70) 

3 - a kj = hi € F„. and (p, 9 ) G J. 

We can use the angle bounds from Lemma 4.3.1 and (68) to show that 

Kjl < 1 . 06 IM + 

Further, since |/jy| < e, 

Wyl < l-06|6*j| + 

After the diagonal transformation 

\hj\ < 7- {1.06|&*j | + — }. (71) 

Adding (69), (70) and (71) for all hj we get a bound for ||i?„||£, 

l|5,llc < T |4.6||B„|| 2 + nVU±S| . 

An identical argument can be used to show a bound for iteration (66), 

115,11c < r|4.6||5,|| E + nV^i±^|. 

< t ! |22||B„|| e + 3„vli±iU . 
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Now we consider i?„ + This includes elements of R v as well as those of B v . But 
using Lemma 4.4.3 and accounting for the diagonal transformation, 

fkj < 367 2 r J j 

Adding the bound for R v that we get from this to the bound on B v , 

II^+iIIe < T 2 {22||B„||e + Qe 2 } , (72) 

where 

Q = ^3n 2 iiy^ + 36nyj . (73) 

Case 2: v > n. Refer to Fig.4. First consider the pivots in x^-n+ 2 - These only 
can couple elements of B u with other elements of B„. Using the angle bounds from 
(32) it is easily seen that this can only increase the size of ||J3„||r: by a factor of 12. 

The analysis for the pivots in x *+2 is identical as that for the case v < n. Taking 
into account the factor of 12 that can arise due to the pivots in x»-n + 2 we get 

l|£*>+llb < 12r a {22||S V ||£ + Qc 2 } . 

Solving this recurrence, 

||£„||s < (264r 2 )*'12r 2 Qe 2 . 

After 2n — 3 rotation sets, B v = E v . We have considered 2n — 3 rotation sets in 
the sweep. We have to consider 3 more to complete two sweeps. In these 3 rotation 
sets, ||£||e can grow by a factor of only (4.6) 3 , by Lemma 4.4.1. Therefore, using the 
fact that ||i?|| < II^He and the value e = n(4.6) 2n ||^i||, 

II-EWII < ^(A, (5,n)||F?i|| 2 , 
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Figure 5: Results for Random matrices. 


where 

I< 2 = (4.6) 3 (3V) 2n 12 n 2 r 2 Q. 

□. 


5 Numerical Results: 

We present results of experiments with the parallel norm reducing algorithm (PNRJ) 
for a variety of test matrices. For the convergence test (19) we use e = 10 -15 , and 
the value r = 10 s is used in (14) to bound the diagonal transformations. In Fig.5 
results are presented for random real matrices. The number of sweeps required for 
convergence (according to the criterion (19)) is shown as a function of log 2 n, where n 
is the order of the matrix. The results for random complex matrices were similar. The 
dotted line is a reference line of slope 2.8. So we can empirically state that algorithm 
PNRJ requires 2.8log 2 n sweeps to converge, for random matrices. 

In Fig.5 we compare the order of convergence of algorithm PNRJ with a Schur 
type Jacobi method, in which the non-unitary transformations are omitted (PSU). 
The latter is very similar to the method proposed by Eberlein [6]. We consider results 
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Figure 6: Order of Convergence (30 x 30 Random Matrix) 

for a 30 x 30 random matrix. We plot the logarithm (base 2) of the norm of the lower 
triangle of the matrix before each sweep (||I/||) against the value after each sweep 
(Ill'll). The slope of the plot indicates the order of convergence. Two reference 
lines of slopes one and two axe also shown (dotted lines). Notice that the order of 
convergence is two for the norm reducing method but only one for the purely unitary 
method. 

The convergence behavior of Jacobi- like methods generally deteriorates as the 
matrix A becomes increasingly non-normal. We generated increasingly non-normal 
matrices using the method of Stewart [25]. These matrices have the form 

A = U{D + qF)U t (74) 

where U is unitary, D = diag(l,2, . . . ,n) and F is a random strictly upper triangular 
matrix. The parameter a controls the non-normality of the matrix A. 

In Table 1 we present results for 24 x 24 Stewart matrices with different values 
of a. The degradation in the convergence rates is present for both PSU and PNRJ, 
but PNRJ degrades more slowly. In fact for a = 8 PSU did not converge. We 
observed ultimate quadratic convergence for PNRJ in all cases, though the onset of 
this behavior is delayed as a increases. 


28 





# sweeps 

a 

PNRJ 

PSU 

1 

8 

17 

2 

9 

28 

4 

12 

36 

8 

17 

00 


Table 1: Performance for 24 x 24 Stewart matrices. 



# sweeps 

n 

PNRJ 

PSU 

8 

12 

20 

12 

23 

47 


Table 2: Performance for Frank matrices. 


Frank matrices B n = {6 P ,} are defined by 


f min (ij) if j > i — 1 
( 0 otherwise 


(75) 


The smaller eigenvalues of these matrices are known to be very ill-conditioned. Results 
for Frank matrices are shown in Table 2. 

In Table 3 we report the accuracy of the computed eigenvalues and eigenvectors. 
Results are shown for all of the matrices considered above. ‘Max Error’ is the maxi- 
mum error in the computed eigenvalues (with the eigenvalues computed by EISPACK 
taken as the true values). P is the matrix of normalized eigenvectors computed by 
PNRJ, V is the matrix of normalized eigenvectors computed by EISPACK, and A is 
the diagonal matrix of eigenvalues computed by PNRJ. 

We first note that the computed normalized eigenvector matrix P has condition 
comparable to the eigenvector matrix V, obtained from EISPACK. Also, the residual 
||AP — PA ||2 is small, indicating that the computed eigenvalue-eigenvector pair is a 
good approximation. The error in the computed eigenvalues degrades with increasing 
non-normality of the matrices, as is to be expected. For the extremely ill-conditioned 
Frank matrix of order 12, the ill-conditioned eigenvalues differ from those computed 
by EISPACK by a large amount. However, it was noticed that the well-conditioned 
eigenvalues were computed with low error. 

Although the unitary transformations (5) were chosen to annihilate only the ele- 
ments a qp in the lower triangle of the matrix, the final matrix was in fact very close to 
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A 

Max Error 

II AP - PAH, 

cond(P) 

cond(V) 

30 x 30 

Random Matrix 

3.55e-14 

4.18e-14 

6.46e+00 

6.86e+00 

24 x 24 

Stewart Matrices 

Q = 1 
a = 2 
a = 4 
a = 8 

1.10e-13 

1.49e-13 

1.03e-12 

1.56e-10 

1.09e-13 

2.00e-13 

1.72e-ll 

2.13e-13 

8.81e+00 

4.80e-f01 

1.13e+03 

5.41e+05 

l.OOe-fOl 

5.64e+01 

1.20e+03 

5.28e+05 

Frank Matrices 
n = 8 
n = 12 

6.06e-ll 

1.64e-06 

1.07e-12 

9.80e-14 

7.89e+03 

1.48e+08 

8.41e+03 

1.49e+08 


Table 3: Accuracy of Computed Eigenvalues and Eigenvectors. 


-f’ .■ 

diagonal in all the cases we triedkKpais is to be expected for diagonalizable matrices 
since the norm-reducing trai^forajations move the matrix toward a normal matrix. 
However, even for nearly, defective matrices, the algorithm actually diagonalizes a 
nearby diagonalizable matrix,. po the final matrix is still diagonal, except that the 
computed eigenvalues have larger errors. Therefore, in this algorithm the eigenvec- 
tors can be read off from the columns of the matrix obtained by accumulating the 
transformation matrices T r . This is of course not so for the Schur type methods. 


6 Discussion and Conclusions: 

We are unable to prove global convergence of algorithm PNRJ. The difficulty comes 
from the fact that we cannot prove that the parallel algorithm always causes a decrease 
in ||j 4||, as well as from the cyclic choice of the pivots. The globed convergence proof 
for Eberlein’s norm reducing method [5] as well as that for Sameh’s parallel algorithm 
for real non-symmetric matrices relies on the provable norm reduction property as 
well as an optimal choice of pivots. 

Optimal pivoting strategies (which require searching the elements of the commu- 
tator C at each iteration) are difficult to implement efficiently and may not be worth 
doing merely to obtain the global convergence proof, when cyclic strategies perform 
just as well in practice. 

Regarding the reduction in ||A||, although we are unable to prove it in general for 
algorithm PNRJ, we observed that in all the experiments described in the previous 
section, the norm ||A|| never increased after a rotation set and invariably decreased 
(except during the last few sweeps, when the matrix was already normal). Therefore 
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we conclude that the question of global convergence of algorithm PNRJ is still open. 

Various implementation issues remain to be investigated in order to improve the 
algorithm further. In particular, the broadcasting of information is an undesirable fea- 
ture because communication of this sort is costly in all parallel computers. However, if 
we consider the proof of quadratic convergence we notice that the global information 
does not play a significant role. Therefore we could consider not computing global 
information during the later iterations which would reduce the cost of computation 
as well as communication in the later stages of the iteration. Experiments along these 
lines are currently in progress. Another issue that needs to be considered is a block 
version of the algorithm similar to block Jacobi methods for the hermitian eigenvalue 
problem [1,24]. 

In spite of using non unitary transformations* PNRJ computes eigenvalues and 
eigenvectors with accuracy comparable to that of EISPACK. This feature is inherited 
from the norm reducing methods on which PNRJ is based [5,21]. To explain the 
stability of these methods, we can argue that the iterates Ak axe moving closer to 
a normal matrix and therefore their eigenproblem is becoming successively better 
conditioned. Further, as long as only small norm similarity transformations are used, 
the final matrix will be exactly similar to a matrix that is close to the original matrix 
[29]. However, a formal error analysis proving the stability of norm-reducing methods 
does not appear to have been carried out, and is an area open for further investigation. 

To compare algorithm PNRJ with the QR algorithm let us consider operation 
counts. The single shift hessenberg QR algorithm for general complex matrices [11] 
computes the Schur decomposition in approximately 26n 3 complex floating point op- 
erations. Here we have used the empirical observation that usually about 3 iterations 
are required to decouple one eigenvalue. On the other hand, on a sequential machine, 
PNRJ requires about 9n 3 operations for one sweep. So for random matrices, PNRJ 
is slower than the QR algorithm by a factor of about log 2 n. However on a parallel 
computer the situation is not so clear since efficient parallel implementations of the 
QR algorithm (like 0(n log 3 n) time using 0(n 3 ) processors) are not known. This is 
an area of ongoing research. 

As described in the previous section, algorithm PNRJ slows down as we move to 
matrices that are increasingly non normal. Although the QR algorithm also displays 
this degradation, the effect of non normality is much less pronounced. Understanding 
the effect of non normality on algorithm PNRJ in more detail is also an important 
topic for research. 

Finally, a version of the algorithm that uses only real arithmetic for real matrices 
is desirable. Real Jacobi-like methods that converge quadratically for certain classes 
of matrices have been investigated by Veselic [27,28]. Sameh’s algorithm [22] is a 
parallel algorithm for real matrices. Work is in progress to develop a quadratically 
convergent parallel algorithm for real matrices. 

To summarize, we have introduced algorithm PNRJ, a parallel Jacobi like algo- 
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rithm that diagonalizes a diagonalizable matrix. It has been proved that the algorithm 
converges quadratically for any parallel ordering of the pivots if the matrix has dis- 
tinct eigenvalues. In the presence of multiple eigenvalues quadratic convergence can 
be shown if a particular parallel ordering, the modulus ordering, is used to choose the 
pivots. 

Experimental results confirm the ultimate quadratic convergence of the algorithm, 
though the onset of quadratic convergence is delayed with increasing non-normality 
of the matrices. For nearly defective matrices the well conditioned eigenvalues are 
computed accurately, though the convergence is also delayed in this case. For random 
matrices the algorithm exhibits an improved rate of convergence compared to some 
other parallel Jacobi methods for the unsymmetric eigenvalue problem. 

Algorithm PNRJ can be implemented using n 2 / 4 processors and performs one 
‘sweep’ in 0(n log n) time. For random matrices, it is empirically observed that 
0(log n) sweeps are necessary for convergence. 

7 Acknowledgements: 

I wish to thank Rob Schreiber for valuable advice and suggestions and for reading 
the manuscript with great care. 


References 

[1] Christian Bischof. Computing the Singular Value Decomposition on a Distributed 
System of Vector Processors. Technical Report 87-869, Department of Computer 
Science, Cornell University, 1987. 

[2] R.P. Brent and F.T. Luk. The solution of singular-value and symmetric eigen- 
value problems on multiprocessor arrays. SIAM J. Sci. Stat. Comp., 6(l):69-84, 
1985. 

[3] R.L. Causey. Computing eigenvalues of non hermitian matrices by methods of 
Jacobi type. J. SIAM , 6:172-181, 1958. 

[4] J.J Dongarra and D.C. Sorensen. A fully parallel algorithm for the symmetric 
eigenvalue problem. SIAM J. Sci. Stat. Comp., 8(2):138-154, 1987. 

[5] P.J. Eberlein. A Jacobi method for the automatic computation of eigenvalues 
and eigenvectors of an arbitrary matrix. J. SIAM, 10:74-88, 1962. 

[6] P.J. Eberlein. On the Schur decomposition of a matrix for parallel computation. 
IEEE Trans. Comp., 36:167-174, 1987. 


32 



[7] P.J. Eberlein and Boothroyd J. Solution to the eigenproblem by a norm-reducing 
Jacobi-type method. Num. Math., 4:24-40, 1968. 

[8] G.E. Forsythe and P. Henrici. The cyclic Jacobi method for computing the 
principal values of a complex matrix. Trans. Amer. Math. Soc., 94:1-23, 1960. 

[9] H.H. Goldstine and Horwitz L.P. A procedure for the diagonalization of normal 
matrices. J. ACM, 6:176-195, 1959. 

[10] H.H. Goldstine, F.J. Murray, and J. yon Neumann. The Jacobi method for real 
symmetric matrices. J. ACM, 6:59-96, 1959. 

[11] G. Golub and C. Van Loan. Matrix Computations. Johns Hopkins University 
Press, 1983. 

[12] E.R. Hansen. On Jacobi methods and block Jacobi methods for computing matrix 
eigenvalues. PhD thesis, Stanford University, 1960. 

[13] C.G.J. Jacobi. Uber ein verfarhen, die in theorie der sakularstorungen vorkom- 
menden gleichungen numerisch aufzulosen. J. Reine Angew. Math., 30:51-95 

1 O AC * 1 » 


[14] Sy-Shin Lo, Bernard Philippe, and Ahmed Sameh. A multiprocessor algorithm 
for the symmetric eigenvalue problem. SIAM J. Sci. Stat. Comp., 8(2):155-165, 
1987. 


[15] F.T. Luk and H.T. Park. On Parallel Jacobi Orderings. Technical Report EE- 
CEG-86-5, School of Electrical Engineering, Cornell University, 1986. 

[16] F.T. Luk and H.T. Park. A Proof of Convergence for two Parallel Jacobi SVD 
Algorithms. Technical Report EE-CEG-86-12, School of Electrical Engineering 
Cornell University, 1986. 

[17] L. Mirsky. On the minimization of matrix norms. Amer. Math. Monthly 65 106- 
107, 1958. 

[18] E.E. Osborne. On pre-conditioning of matrices. J. ACM, 7:338-345, 1960. 

[19] A. Ruhe. The norm of a matrix after a similarity transformation. BIT 9 53-58 
1969. 


[20] A. Ruhe. On the quadratic convergence of a generalization of the Jacobi method 
to arbitrary matrices. BIT, 8:210-231, 1968. 

[21] H. Rutishauser. Une methode pour le calcul des values propres des matrices non 
symetriques. Comptes Rendus, 259:2758, 1964. 


33 


[22] A.H. Sameh. On Jacobi and Jacobi-like algorithms for a parallel computer. 
Math. Comp., 25(115):579-590, 1971. 

[23] R. Schreiber. Solving eigenvalue and singular value problems on an undersized 
systolic array. SIAM J. Sci. Stat. Comp., 7(2):441^151, 1986. 

[24] G. Shroff and R. Schreiber. On the Convergence of the Cyclic Jacobi Method for 
Parallel Block Orderings. Technical Report RPI-CS-88-11, Computer Science 
Dept., Rensselaer Polytechnic Institute, 1988. (To appear in SIAM J. Matrix 
Anal. Appl.). 

[25] G.W. Stewart. A Jacobi-like algorithm for computing the schur decomposition 
of a nonhermitian matrix. SIAM J. Sci. Stat. Comp., 6:853-864, 1985. 

[26] R. van de Geijn. Implementing the QR-Algorithm on an array of processors. 
PhD thesis, University of Maryland, College Park, 1987. 

[27] K. Veselic. On a class of Jacobi-like procedures for diagonalizing arbitrary real 
matrices. Numer. Math., 33:157-172, 1979. 

[28] K. Veselic. A quadratically convergence Jacobi-like method for real matrices 
with complex conjugate eigenvalues. Numer. Math., 33:425-435, 1979. 

[29] J.H. Wilkinson. The Algebraic Eigenvalue Problem. Oxford University Press, 
London, 1965. 

[30] J.H. Wilkinson. A note on the quadratic convergence of the cyclic Jacobi process. 
Numer. Math., 4:296-300, 1962. 


34 





